Generalised Additive Models (GAMs) are incredibly flexible tools that have found particular application in the analysis of time series. In ecology, a host of recent papers and workshops (i.e. the 2018 Ecological Society of America workshop on GAMs that was hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross) have drawn special attention to the wide range of applications that GAMs have for addressing complex ecological problems. Given the many ways that GAMs can model temporal data, it is tempting to use the smooth functions estimated by a GAM to produce out of sample forecasts. Here we will inspect the behaviours of smoothing splines when extrapolating to data outside the range of the training data to examine whether this can be useful in practice. First we load the AirPassengers data from the forecast package and convert to an xts object. This series is a good starting point as it should be highly forecastable given its stable seasonal pattern and nearly linear trend. We will work in the mvgam package, which fits GAMs using MCMC sampling via the JAGS software (Note that JAGS 4.3.0 is required; installation links are found here)
# devtools::install_github('nicholasjclark/mvgam')
library(mvgam)
library(dplyr)
library(xts)
library(forecast)
data("AirPassengers")
series <- xts::as.xts(floor(AirPassengers))
colnames(series) <- c("Air")
View the raw series, its STL decomposition and the distribution of observations. There is a clear seasonal pattern as well as an increasing trend over time for this series, and the distribution shows evidence of a skew suggestive of overdispersion
plot(series)
plot(stl(series, s.window = "periodic"))
hist(series)
Next use the series_to_mvgam function, which converts ts or xts objects into the correct format for mvgam. Here we set train_prop = 0.75, meaning we will include ~ 75% of the observations in the training set and use the remaining ~25% for forecast validation. We also randomly set ~10% of observations to NA so that we can evaluate models based on their predictive abilities
series[sample(1:length(series), floor(length(series)/10),
F)] <- NA
fake_data <- series_to_mvgam(series, freq = 12,
train_prop = 0.75)
Examine the returned object
head(fake_data$data_train)
head(fake_data$data_test)
As a first pass at modelling this series, we will fit a GAM that includes a seasonal smooth and a yearly smooth. As the seasonality is cyclic, we will use a cyclic cubic regression spline for season. The knots are set so that the boundaries of the cyclic smooth match up between December 31 and January 1. We will stick with the default thin plate regression spline for year. This is similar to what we might do when fitting a model in mgcv to try and forecast ahead, except here we also have an explicit model for the residual component. mvgam uses the jagam function from the mgcv package to generate a skeleton JAGS file and updates that file to incorporate any dynamic trend components (so far limited to no trend, random walks or AR models up to order 3). This is advantageous as any GAM that is allowed in mgcv can in principle be used in mvgam to fit dynamic linear models with smooth functions for nonlinear covariate effects. For multivariate series, mvgam also includes an option for dimension reduction by inducing trend correlations via a dynamic factor process. Here we use the Negative Binomial family and a burnin length of 2000 iterations (in practice we would probably run these models for a bit longer, but they tend to converge rather quickly for univariate series thanks to the useful starting values supplied by jagam). Note that feeding the data_test object does not mean that these values are used in training of the model. Rather, they are included as NA so that we can automatically create a forecast from the posterior predictions for these observations. This is useful for plotting forecasts without needing to run new data through the model’s equations later on
mod1 <- mvjagam(data_train = fake_data$data_train,
data_test = fake_data$data_test, formula = y ~
s(season, bs = c("cc"), k = 12) +
s(year, k = 5), knots = list(season = c(0.5,
12.5)), family = "nb", trend_model = "None",
n.burnin = 2000, auto_update = F)
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 96
## Unobserved stochastic nodes: 344
## Total graph size: 4209
##
## Initializing model
We can view the JAGS model file that has been generated to see the additions that have been made to the base gam model. If the user selects return_jags_data = TRUE when calling mvjagam, this file can be modified and the resulting jags_data object can also be modified to fit more complex Bayesian models. Note that here the AR and phi terms have been set to zero as this model does not include a dynamic trend component
mod1$model_file
## [1] "model {"
## [2] ""
## [3] "## GAM linear predictor"
## [4] "eta <- X %*% b"
## [5] ""
## [6] "## Mean expectations"
## [7] "for (i in 1:n) {"
## [8] " for (s in 1:n_series) {"
## [9] " mu[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])"
## [10] " }"
## [11] "}"
## [12] ""
## [13] "## State space trends"
## [14] "for(s in 1:n_series) {"
## [15] " trend[1, s] ~ dnorm(0, tau[s])"
## [16] "}"
## [17] ""
## [18] "for(s in 1:n_series) {"
## [19] " trend[2, s] ~ dnorm(phi[s] + ar1[s]*trend[1, s], tau[s])"
## [20] "}"
## [21] ""
## [22] "for(s in 1:n_series) {"
## [23] " trend[3, s] ~ dnorm(phi[s] + ar1[s]*trend[2, s] +"
## [24] " ar2[s]*trend[1, s], tau[s])"
## [25] "}"
## [26] ""
## [27] "for (i in 4:n){"
## [28] " for (s in 1:n_series){"
## [29] " trend[i, s] ~ dnorm(phi[s] + ar1[s]*trend[i - 1, s] +"
## [30] " ar2[s]*trend[i - 2, s] +"
## [31] " ar3[s]*trend[i - 3, s], tau[s])"
## [32] " }"
## [33] "}"
## [34] ""
## [35] "## AR components"
## [36] "for (s in 1:n_series){"
## [37] " phi[s] <- 0"
## [38] " ar1[s] <- 0"
## [39] " ar2[s] <- 0"
## [40] " ar3[s] <- 0"
## [41] " tau[s] ~ dgamma(0.01, 0.001)"
## [42] "}"
## [43] ""
## [44] "## Negative binomial likelihood functions"
## [45] "for (i in 1:n) {"
## [46] " for (s in 1:n_series) {"
## [47] " y[i, s] ~ dnegbin(rate[i, s], r)"
## [48] " rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,"
## [49] " (r / (r + mu[i, s])))"
## [50] " }"
## [51] "}"
## [52] "r ~ dexp(0.001)"
## [53] ""
## [54] "## Posterior predictions"
## [55] "for (i in 1:n) {"
## [56] " for (s in 1:n_series) {"
## [57] " ypred[i, s] ~ dnegbin(rate[i, s], r)"
## [58] " }"
## [59] "}"
## [60] ""
## [61] " ## Parametric effect priors CHECK tau=1/54^2 is appropriate!"
## [62] "for (i in 1:1) { b[i] ~ dnorm(0, 0.25) }"
## [63] " ## prior for s(season)... "
## [64] " K1 <- S1[1:10,1:10] * lambda[1] "
## [65] " b[2:11] ~ dmnorm(zero[2:11],K1) "
## [66] " ## prior for s(year)... "
## [67] " K2 <- S2[1:4,1:4] * lambda[2] + S2[1:4,5:8] * lambda[3]"
## [68] " b[12:15] ~ dmnorm(zero[12:15],K2) "
## [69] " ## smoothing parameter priors CHECK..."
## [70] " for (i in 1:3) {"
## [71] " lambda[i] ~ dgamma(.05,.005)"
## [72] " rho[i] <- log(lambda[i])"
## [73] " }"
## [74] "}"
Inspect the summary from the model, which is somewhat similar to an mgcv model summary with extra information about convergences for unobserved parameters. The estimated degrees of freedom, smooth coefficients and smooth penalties are all extracted from the mvgam model using sim2jam so that approximate p-values can be calculated using Nychka’s method (following Wood (2013) Biometrika 100(1), 221-228). Note however that this function is still in development and approximate p-values may not be entirely accurate
summary_mvgam(mod1)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5)
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1052.638 2815.232 6926.49 1 1088
##
## GAM smooth term approximate significances:
## edf Ref.df Chi.sq p-value
## s(season) 8.548 10.000 455.800 <2e-16 ***
## s(year) 2.847 4.000 0.163 0.692
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 5.35931647 5.374799131 5.39013033 1.00 972
## s(season).1 -0.23171006 -0.173026763 -0.09744477 1.10 56
## s(season).2 -0.13853671 -0.093439501 -0.03784733 1.06 84
## s(season).3 -0.05126344 0.005636679 0.06958161 1.01 68
## s(season).4 -0.05800030 -0.017102998 0.03097099 1.02 125
## s(season).5 0.04073898 0.080000531 0.12292576 1.00 111
## s(season).6 0.19717598 0.239344081 0.29444988 1.01 110
## s(season).7 0.17312409 0.215388256 0.25998185 1.01 113
## s(season).8 0.01432368 0.064057423 0.12402478 1.03 73
## s(season).9 -0.19040686 -0.132763434 -0.07509330 1.05 82
## s(season).10 -0.17493832 -0.108031633 -0.03896809 1.02 64
## s(year).1 -0.08155003 -0.023432187 0.02218597 1.07 60
## s(year).2 -0.10577890 -0.012393099 0.04032087 1.23 57
## s(year).3 -0.10094135 -0.003860495 0.07271641 1.17 46
## s(year).4 0.32730378 0.370630548 0.42744627 1.09 59
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 4.505168 5.692319 6.727065 1.00 213
## s(year) 1.429274 4.146457 6.084831 1.09 103
## s(year)2 -5.053140 1.273977 3.636683 1.00 441
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0 0 0 NaN 0
## ar1 0 0 0 NaN 0
## ar2 0 0 0 NaN 0
## ar3 0 0 0 NaN 0
##
mvgam takes a fitted gam model and adapts the model file to fit in JAGS, with possible extensions to deal with stochastic trend components and other features. But as this model has not been modified much from the original gam model with the same formula (i.e. we have not added any stochastic trend components to the linear predictor yet, which is the main feature of mvgam), the summary of the unmodified gam model is also useful and fairly accurate. Note however that this base model will always be Poisson distributed because the jagam function does not currently support Negative Binomial distributions.
summary(mod1$mgcv_model)
##
## Family: poisson
## Link function: log
##
## Formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.361724 0.007207 743.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(season) 7.73 10 361.9 <2e-16 ***
## s(year) 1.00 1 2342.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.986 Deviance explained = 98.6%
## fREML = 135.04 Scale est. = 1 n = 96
Ordinarily we would be quite pleased with this result, as we have explained most of the variation in the series with a fairly simple model. We can plot the estimated smooth functions and with their associated credible intervals, which are interpreted similarly to mgcv plots except they are not centred at zero (mvgam predicts the smooth with all other covariates at their means, rather than at zero). So the plot below, for the season smooth, shows the estimated values for the linear predictor (on the log scale in this case) if year was set to its mean
plot_mvgam_smooth(mod1, series = 1, smooth = "season")
And here is the plot of the smooth function for year, which has essentially estimated a straight line
plot_mvgam_smooth(mod1, series = 1, smooth = "year")
Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine simulated kernel densities for posterior predictions (yhat) and compare to the density of the observations (y)
plot_mvgam_ppc(mod1, series = 1, type = "density")
Now plot the distribution of predicted means compared to the observed mean
plot_mvgam_ppc(mod1, series = 1, type = "mean")
Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (yhat) and compare to the CDF of the observations (y)
plot_mvgam_ppc(mod1, series = 1, type = "cdf")
Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform
plot_mvgam_ppc(mod1, series = 1, type = "pit")
All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviors exhibited. Now for some investigation of the estimated relationships and forecasts. We can also perform residual diagnostics using randomised quantile (Dunn-Smyth) residuals. These look reasonable overall, though there is some autocorrelation left in the residuals for this series
plot_mvgam_resids(mod1, series = 1)
Ok so the model is doing well when fitting against the training data, but how are its forecasts? The yearly trend is being extrapolated into the future, which controls most of the shape and uncertainty in the forecast. We see there is a reasonable estimate of uncertainty and the out of sample observations (to the right of the dashed line) are all within the model’s 95% HPD intervals
plot_mvgam_fc(mod1, series = 1, data_test = fake_data$data_test)
The extrapolation of the smooth for year can be better viewed by feeding new data to the plot_mvgam_smooth function. Here we feed values of year to cover the training and testing set to see how the extrapolation would continue into the future. The dashed line marks the end of the training period
plot_mvgam_smooth(mod1, series = 1, "year",
newdata = expand.grid(year = seq(min(fake_data$data_train$year),
max(fake_data$data_test$year), length.out = 500),
season = mean(fake_data$data_train$season),
series = unique(fake_data$data_train$series)))
abline(v = max(fake_data$data_train$year),
lty = "dashed")
We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values
plot_mvgam_ppc(mod1, series = 1, type = "density",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod1, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod1, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod1, series = 1, type = "pit",
data_test = fake_data$data_test)
There are some very clear problems with the way this model is generating future predictions. Perhaps a different smooth function for year can help? Here we fit our original model again but use a different smooth term for year to try and capture the long-term trend (using B splines with multiple penalties, following the excellent example by Gavin Simpson about extrapolating with smooth terms). This is similar to what we might do when trying to forecast ahead from a more wiggly function, as B splines have useful properties by allowing the penalty to be extended into the range of values we wish to predict (in this case, the years in data_test)
mod2 <- mvjagam(data_train = fake_data$data_train,
data_test = fake_data$data_test, formula = y ~
s(season, bs = c("cc"), k = 12) +
s(year, bs = "bs", m = c(3, 2,
1, 0)), knots = list(season = c(0.5,
12.5), year = c(min(fake_data$data_train$year) -
1, min(fake_data$data_train$year),
max(fake_data$data_train$year), max(fake_data$data_test$year))),
family = "nb", trend_model = "None",
n.burnin = 2000, auto_update = F)
## Warning in smooth.construct.bs.smooth.spec(object, dk$data, dk$knots): there is
## *no* information about some basis coefficients
## Warning in smooth.construct.bs.smooth.spec(object, dk$data, dk$knots): basis
## dimension is larger than number of unique covariates
## Warning in smooth.construct.bs.smooth.spec(object, dk$data, dk$knots): there is
## *no* information about some basis coefficients
## Warning in smooth.construct.bs.smooth.spec(object, dk$data, dk$knots): basis
## dimension is larger than number of unique covariates
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 96
## Unobserved stochastic nodes: 345
## Total graph size: 5149
##
## Initializing model
Again as we haven’t modified the base gam much, the summary from the mgcv model is fairly accurate
summary_mvgam(mod2)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3,
## 2, 1, 0))
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1178.5 2823.615 6620.595 1 1400
##
## GAM smooth term approximate significances:
## edf Ref.df Chi.sq p-value
## s(season) 8.559 10.000 497.6 <2e-16 ***
## s(year) 4.863 9.000 896.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 5.36005894 5.37503545 5.39036724 1.00 1243
## s(season).1 -0.24015587 -0.17086381 -0.10315605 1.04 58
## s(season).2 -0.14449938 -0.09361737 -0.03126520 1.02 70
## s(season).3 -0.04316788 0.01534325 0.07470943 1.07 82
## s(season).4 -0.05867187 -0.01779126 0.02696013 1.00 108
## s(season).5 0.02913474 0.07589775 0.11717103 1.00 106
## s(season).6 0.20383428 0.24241902 0.29588539 1.07 124
## s(season).7 0.16757319 0.21295929 0.25669624 1.01 115
## s(season).8 0.02563166 0.06919195 0.11494449 1.02 92
## s(season).9 -0.19612418 -0.13334174 -0.08394330 1.02 73
## s(season).10 -0.18394484 -0.11359154 -0.04739631 1.02 59
## s(year).1 -1.73729194 -1.15683069 -0.57529641 3.29 10
## s(year).2 -0.44785284 -0.31404320 -0.17854895 1.37 50
## s(year).3 -0.17765978 -0.07247616 0.02665727 1.05 63
## s(year).4 0.16109691 0.27827800 0.40392076 1.70 38
## s(year).5 0.23509973 0.34282601 0.43948082 1.05 53
## s(year).6 0.54530053 0.68789588 0.81483518 1.66 43
## s(year).7 0.66229347 0.78233451 0.88370179 1.07 59
## s(year).8 0.13776754 0.71101428 1.17655661 1.59 12
## s(year).9 -0.10238448 0.96154376 1.64212528 2.24 5
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 4.331802 5.6391945 6.624445 1.02 151
## s(year) 3.295675 5.2913875 6.733248 1.02 104
## s(year)2 -42.183234 0.3819474 3.449432 1.78 22
## s(year)3 -34.981919 -7.7530711 1.275780 1.84 16
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0 0 0 NaN 0
## ar1 0 0 0 NaN 0
## ar2 0 0 0 NaN 0
## ar3 0 0 0 NaN 0
##
summary(mod2$mgcv_model)
##
## Family: poisson
## Link function: log
##
## Formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3,
## 2, 1, 0))
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.361772 0.007207 744 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(season) 7.7295 10 362 <2e-16 ***
## s(year) 0.9996 9 2341 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.986 Deviance explained = 98.6%
## fREML = 135.37 Scale est. = 1 n = 96
This model explains even more of the variation than the thin plate yearly model above, so we’d be tempted to use it for prediction (though of course we’d want to perform a series of checks following advice from Simon Wood; see lectures by Gavin Simpson for more information on how to perform these checks). So how do the forecasts look?
plot_mvgam_fc(mod2, series = 1, data_test = fake_data$data_test)
Again the forecast is being driven primarily by the extrapolation behaviour of the B spline. Look at this behaviour for the year smooth as we did above by feeding new data for prediction
plot_mvgam_smooth(mod2, series = 1, "year",
newdata = expand.grid(year = seq(min(fake_data$data_train$year),
max(fake_data$data_test$year), length.out = 500),
season = mean(fake_data$data_train$season),
series = unique(fake_data$data_train$series)))
abline(v = max(fake_data$data_train$year),
lty = "dashed")
Residual and posterior in-training predictive checks for this model will look similar to the above, with some autocorrelation left in residuals but nothing terribly alarming. But the forecast checks again show some problems:
plot_mvgam_ppc(mod2, series = 1, type = "density",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "pit",
data_test = fake_data$data_test)
Can we improve the forecasts by removing our reliance on extrapolation? Now we will fit a model in which the GAM component of the linear predictor captures the repeated seasonality (again with a cyclic smooth) and a dynamic latent trend captures the residual process using AR parameters (up to order 3). This model is a mildly modified version of the base mgcv model where the linear predictor is augmented with the latent trend component. Slightly longer burnin is used here due to the added complexity of the time series component, but the model still fits in ~ 30 seconds on most machines
mod3 <- mvjagam(data_train = fake_data$data_train,
data_test = fake_data$data_test, formula = y ~
s(season, bs = c("cc"), k = 12),
knots = list(season = c(0.5, 12.5)),
family = "nb", trend_model = "AR3", n.burnin = 3000,
auto_update = F)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 96
## Unobserved stochastic nodes: 345
## Total graph size: 3874
##
## Initializing model
In this case the fitted model is more different to the base mgcv model that was used in jagam to produce the skeleton JAGS file, so the summary of that base model is less accurate. But we can still check the model summary for the mvjagam mdoel to examine convergence for key parameters
summary_mvgam(mod3)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12)
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1207.283 2946.139 6636.348 1 1453
##
## GAM smooth term approximate significances:
## edf Ref.df Chi.sq p-value
## s(season) 7.766 10.000 410.2 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 4.686684011 4.74861532 4.79332178 1.06 14
## s(season).1 -0.188872004 -0.08798572 -0.03161602 1.22 72
## s(season).2 -0.111614196 -0.03974986 0.01979238 1.13 70
## s(season).3 -0.004237354 0.05195922 0.11222878 1.09 84
## s(season).4 -0.044271735 0.00167653 0.04897618 1.01 120
## s(season).5 0.044058633 0.08766032 0.12788149 1.00 107
## s(season).6 0.192644451 0.23286117 0.27461934 1.02 131
## s(season).7 0.161392726 0.20034195 0.23586815 1.06 146
## s(season).8 -0.005829383 0.04046296 0.08769093 1.02 107
## s(season).9 -0.225889484 -0.18483286 -0.13684293 1.01 109
## s(season).10 -0.240968072 -0.17307924 -0.11784684 1.07 72
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 4.696311 5.931051 6.885914 1.06 442
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0.01379049 0.02578072 0.03990746 1.01 264
## ar1 -0.09112836 0.30335185 0.68624604 1.01 259
## ar2 -0.10229781 0.33283169 0.73560907 1.01 287
## ar3 -0.03101280 0.37621187 0.72944306 1.01 265
##
The seasonal term is obviously still very important. Plot it here
plot_mvgam_smooth(mod3, series = 1, "season")
Plot diagnostics of posterior median Dunn-Smyth residuals, where the autocorrelation is now captured by the latent trend process
plot_mvgam_resids(mod3, series = 1)
How does the model’s posterior forecast distribution compare to the previous models?
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "density",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "pit",
data_test = fake_data$data_test)
None of the model’s is a perfect representation of the data generating process, but the forecast for our dynamic GAM is more stable and more accurate than the previous models, with the added advantage that we can place more trust our estimated smooth for season because we have captured the residual autocorrelation. The posterior checks for our dynamic GAM also look much better than the two previous models. Plot the estimated latent dynamic trend (which is on the log scale)
plot_mvgam_trend(mod3, series = 1, data_test = fake_data$data_test)
Benchmarking against “null” models is a very important part of evaluating a proposed forecast model. After all, if our complex dynamic model can’t generate better predictions then a random walk or mean forecast, is it really telling us anything new about the data-generating process? Here we examine the model comparison utilities in mvgam. Here we illustrate how this can be done in mvgam by fitting a simpler model by smoothing on a white noise covariate rather than on the seasonal variable. Because the white noise covariate is not informative and we are using a random walk for the trend process, this model essentially becomes a Poisson observation model over a dynamic random walk process.
fake_data$data_train$fake_cov <- rnorm(NROW(fake_data$data_train))
fake_data$data_test$fake_cov <- rnorm(NROW(fake_data$data_test))
mod4 <- mvjagam(data_train = fake_data$data_train,
data_test = fake_data$data_test, formula = y ~
s(fake_cov, k = 3), family = "poisson",
trend_model = "RW", n.burnin = 3000,
auto_update = F)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 96
## Unobserved stochastic nodes: 342
## Total graph size: 2484
##
## Initializing model
Look at this model’s proposed forecast
plot_mvgam_fc(mod4, series = 1, data_test = fake_data$data_test)
Here we showcase how different dynamic models can be compared using rolling probabilistic forecast evaluation, which is especially useful if we don’t already have out of sample observations for comparing forecasts. This function sets up a sequence of evaluation timepoints along a rolling window within the training data to evaluate ‘out-of-sample’ forecasts. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an ‘out-of-sample’ forecast that is evaluated against the true observations in the horizon window. We are therefore simulating a situation where the model’s parameters had already been estimated but we have only observed data up to the evaluation timepoint and would like to generate forecasts that consider the possible future paths for the latent trends and the true observed values for any other covariates in the horizon window. Evaluation involves calculating the Discrete Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast’s 90% prediction interval. For this test we compare the two models on the exact same sequence of 30 evaluation points using horizon = 6
compare_mvgams(model1 = mod3, model2 = mod4,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 2.559179 4.061879 4.97326 5.692219 6.162038 20.57485 18
## Model 2 4.385619 10.518443 16.75650 22.444538 27.558161 95.36302 18
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 1
## Model 2 0.9691358
The series of plots generated by compare_mvgams clearly show that the first dynamic model generates better predictions. In each plot, DRPS for the forecast horizon is lower for the first model than for the second model. This kind of evaluation is often more appropriate for forecast models than complexity-penalising fit metrics such as AIC or BIC However, comparing forecasts of the dynamic models against the two models with yearly smooth terms (mod1 and mod2) using the rolling window approach is actually not recommended, as the yearly smooth models have already seen all the possible in-sample values of year and so should be able to predict incredibly well by interpolating through the range of the fitted smooth. By contrast, the dynamic component in our dynamic GAMs (models 3 and 4) produce true forecasts when running the rolling window approach. Nevertheless, when we compare some of these models (here mod1 with the thin plate yearly smooth vs mod3 with the AR3 trend process) as we did above for the random walk model, we still find that our dynamic GAM produces superior probabilistic forecasts
compare_mvgams(model1 = mod1, model2 = mod3,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 2.616286 4.509835 5.939974 7.154144 8.444310 22.62026 18
## Model 2 2.603351 4.020251 4.915574 5.667138 6.084009 21.04441 18
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 0.9876543
## Model 2 1
The same holds true when comparing against the B spline model (mod2)
compare_mvgams(model1 = mod2, model2 = mod3,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 2.764191 5.266911 7.354980 8.796235 11.759042 26.91656 18
## Model 2 2.714219 4.050906 5.019303 5.663657 6.098369 21.29145 18
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 0.9814815
## Model 2 1
Now we proceed by exploring how forecast distributions from an mvgam object can be automatically updated in light of new incoming observations. This works by generating a set of “particles” that each captures a unique proposal about the current state of the system (in this case, the current estimate of the latent trend component). The next observation in data_assim is assimilated and particles are weighted by how well their proposal (i.e. their proposed forecast, prior to seeing the new data) matched the new observations. For univariate models such as the ones we’ve fitted so far, this weight is represented by the proposal’s Negative Binomial log-likelihood. For multivariate models, a multivariate composite likelihood is used for weights. Once weights are calculated, we use importance sampling to update the model’s forecast distribution for the remaining forecast horizon. Begin by initiating a set of 5000 particles by assimilating the next observation in data_test and storing the particles in the default location (in a directory called particles within the working directory)
pfilter_mvgam_init(object = mod3, n_particles = 5000,
n_cores = 2, data_assim = fake_data$data_test)
## Saving particles to pfilter/particles.rda
## ESS = 32.83125
Now we are ready to run the particle filter. This function will assimilate the next six out of sample observations in data_test and update the forecast after each assimilation step. This works in an iterative fashion by calculating each particle’s weight, then using a kernel smoothing algorithm to “pull” low weight particles toward the high-likelihood space before assimilating the next observation. The strength of the kernel smoother is controlled by kernel_lambda, which in our experience works well when left to the default of 1. If the Effective Sample Size of particles drops too low, suggesting we are putting most of our belief in a very small set of particles, an automatic resampling step is triggered to increase particle diversity and reduce the chance that our forecast intervals become too narrow and incapable of adapting to changing conditions
pfilter_mvgam_online(data_assim = fake_data$data_test[1:7,
], n_cores = 2, kernel_lambda = 1)
## Particles have already assimilated one or more observations. Skipping these
##
## Assimilating the next 6 observations
##
## Effective sample size is 8.44879 ...
##
## Smoothing particles ...
##
## Effective sample size is 121.5226 ...
##
## Smoothing particles ...
##
## Effective sample size is 451.9766 ...
##
## Smoothing particles ...
##
## Effective sample size is 2275.511 ...
##
## Smoothing particles ...
##
## Effective sample size is 4030.958 ...
##
## Smoothing particles ...
##
## Effective sample size is 4430.902 ...
##
## Smoothing particles ...
##
## Last assimilation time was 7 1958
##
## Saving particles to pfilter/particles.rda
## ESS = 4430.902
Once assimilation is complete, generate the updated forecast from the particles using the covariate information in remaining data_test observations. This function is designed to hopefully make it simpler to assimilate observations, as all that needs to be provided once the particles are initiated as a dataframe of test data in exactly the same format as the data that were used to train the initial model. If no new observations are found (observations are arranged by year and then by season so the consistent indexing of these two variables is very important!) then the function returns a NULL and the particles remain where they are in state space.
fc <- pfilter_mvgam_fc(file_path = "pfilter",
n_cores = 2, data_test = fake_data$data_test,
ylim = c(min(fake_data$data_train$y,
na.rm = T), max(fake_data$data_test$y,
na.rm = T) * 1.25))
Compare the updated forecast to the original forecast to see how it has changed in light of the most recent observations
par(mfrow = c(1, 2))
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test,
ylim = c(min(fake_data$data_train$y,
na.rm = T), max(fake_data$data_test$y,
na.rm = T) * 1.25))
fc$Air()
Here it is apparent that the distribution has shifted slightly in light of the 6 observations that have been assimilated, and that our confidence in the remaining forecast horizon has improved (tighter uncertainty intervals). This is an advantageous way of allowing a model to slowly adapt to new conditions while breaking free of restrictive assumptions about residual distributions. See some of the many particle filtering lectures by Nathaniel Osgood for more details. Remove the particles from their stored directory when finished
unlink("pfilter", recursive = T)